Evaluation Measures for Multi-class Subgroup Discovery

Authors

  • Tarek Abudawood
  • Peter A. Flach
Abstract

Subgroup discovery aims to find subsets of a population whose class distribution differs significantly from the overall distribution. It has so far been investigated predominantly in a two-class context. This paper investigates multi-class subgroup discovery methods. We consider six evaluation measures for multi-class subgroups, four of them new, and study their theoretical properties. We extend the two-class subgroup discovery algorithm CN2-SD to incorporate the new evaluation measures and a new weighting scheme inspired by AdaBoost. We demonstrate the usefulness of multi-class subgroup discovery experimentally, using discovered subgroups as features for a decision tree learner. Not only is the number of leaves of the decision tree reduced by a factor of between 8 and 16 on average, but significant improvements in accuracy and AUC are also achieved with particular evaluation measures and settings. Similar performance improvements can be observed when using naive Bayes.
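
To make the idea of a subgroup whose class distribution differs from the overall distribution concrete, here is a minimal sketch in Python. It scores a subgroup with a chi-squared goodness-of-fit statistic against the population's class proportions. This statistic is chosen purely for illustration and is not claimed to be one of the six measures studied in the paper; the function name `chi_squared_score` and the toy labels are hypothetical.

```python
from collections import Counter


def chi_squared_score(subgroup_labels, population_labels):
    """Illustrative (assumed) quality score for a multi-class subgroup.

    Compares the observed class counts in the subgroup with the counts
    expected if the subgroup followed the overall class distribution.
    Larger values mean the subgroup deviates more from the population.
    """
    n_pop = len(population_labels)
    n_sub = len(subgroup_labels)
    if n_pop == 0 or n_sub == 0:
        return 0.0

    pop_counts = Counter(population_labels)
    sub_counts = Counter(subgroup_labels)

    score = 0.0
    for cls, pop_count in pop_counts.items():
        # Expected count of this class if the subgroup mirrored the population.
        expected = n_sub * pop_count / n_pop
        observed = sub_counts.get(cls, 0)
        score += (observed - expected) ** 2 / expected
    return score


# Toy three-class population: 50% "a", 30% "b", 20% "c".
population = ["a"] * 50 + ["b"] * 30 + ["c"] * 20

# A subgroup with the same proportions as the population scores zero.
typical = ["a"] * 5 + ["b"] * 3 + ["c"] * 2

# A subgroup dominated by class "c" scores much higher.
unusual = ["a"] * 2 + ["b"] * 2 + ["c"] * 16

typical_score = chi_squared_score(typical, population)
unusual_score = chi_squared_score(unusual, population)
```

In this sketch the `typical` subgroup scores zero, while the `unusual` subgroup, which is dominated by class "c", receives a large score; a subgroup discovery algorithm would prefer the latter.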

Similar resources

The Advantages of Seed Examples in First-Order Multi-class Subgroup Discovery

Subgroup discovery is halfway between predictive and descriptive rule learning: while there is a target concept, the goal of subgroup discovery is not necessarily to achieve high accuracy in predicting the target, but rather to identify subsets of the population whose class distribution is significantly different from the overall distribution. The target concept helps us to achieve a trade-off ...

A Condensed Representation of Itemsets for Analyzing Their Evolution over Time

  • On Structured Output Training: Hard Cases and an Efficient Alternative (p. 7)
  • Sparse Kernel SVMs via Cutting-Plane Training (p. 8)
  • Hybrid Least-Squares Algorithms for Approximate Policy Evaluation (p. 9)
  • A Self-training Approach to Cost Sensitive Uncertainty Sampling (p. 10)
  • Learning Multi-linear Representations of Distributions for Efficient Inference (p. 11)
  • Cost-Sensitive Learning Based on Bregman Div...

First-Order Multi-class Subgroup Discovery

Subgroup discovery is concerned with finding subsets of a population whose class distribution is significantly different from the overall distribution. Previously subgroup discovery has been predominantly investigated under the propositional logic framework. This paper investigates multi-class subgroup discovery in an inductive logic programming setting, where subgroups are defined by conjuncti...

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain a suitable feature subset from among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected due to some measu...

Novel Techniques for Efficient and Effective Subgroup Discovery

Large volumes of data are collected today in many domains. Often there is so much data available that it is difficult to identify the relevant pieces of information. Knowledge discovery seeks to obtain novel, interesting and useful information from large datasets. One key technique for that purpose is subgroup discovery. It aims at identifying descriptions for subsets of the data, which have ...

Publication date: 2009